{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OkpPD_RfNYkg"
   },
   "source": [
    "# **Unstructured RAG**\n",
    "Unstructured or (Semi-Structured) RAG is a method designed to handle documents that combine text, tables, and images. It addresses challenges like broken tables caused by text splitting and the difficulty of embedding tables for semantic search.\n",
    "\n",
    "Here we are using unstructured.io to parse and separate text, tables, and images.\n",
    "\n",
    "Tool Reference: [Unstructured](https://unstructured.io/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "urjXoWDk9rg5"
   },
   "source": [
    "## **Initial Setup**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "oYh10OBF6sUe"
   },
   "outputs": [],
   "source": [
    "! pip install --q athina faiss-gpu pytesseract unstructured-client \"unstructured[all-docs]\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "F0bKsHlVQqqc"
   },
   "outputs": [],
   "source": [
    "!apt-get install poppler-utils\n",
    "!apt-get install tesseract-ocr\n",
    "!apt-get install libtesseract-dev"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "8AHlmfoP6t21"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from google.colab import userdata\n",
    "os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
    "os.environ['ATHINA_API_KEY'] = userdata.get('ATHINA_API_KEY')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9ObH1Z5s9svo"
   },
   "source": [
    "## **Indexing**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "Ougq8dbPFvO4"
   },
   "outputs": [],
   "source": [
    "# load embedding model\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "03zjZ3WF7LoE"
   },
   "outputs": [],
   "source": [
    " # load and extract images, tables, and chunk text\n",
    "from unstructured.partition.pdf import partition_pdf\n",
    "\n",
    "filename = \"/content/sample.pdf\"\n",
    "\n",
    "pdf_elements = partition_pdf(\n",
    "    filename=filename,\n",
    "    extract_images_in_pdf=True,\n",
    "    strategy = \"hi_res\",\n",
    "    hi_res_model_name=\"yolox\",\n",
    "    infer_table_structure=True,\n",
    "    chunking_strategy=\"by_title\",\n",
    "    max_characters=3000,\n",
    "    combine_text_under_n_chars=200,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "QHk-K9voHeea",
    "outputId": "1e1246b1-9e3e-48a5-a101-8530f5ce4ae0"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Counter({\"<class 'unstructured.documents.elements.CompositeElement'>\": 14,\n",
       "         \"<class 'unstructured.documents.elements.TableChunk'>\": 2})"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check unique categories\n",
    "from collections import Counter\n",
    "category_counts = Counter(str(type(element)) for element in pdf_elements)\n",
    "unique_categories = set(category_counts)\n",
    "category_counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "E7jI_njBE5kj",
    "outputId": "9f9eea79-a762-4c7a-b431-cc97869d66d9"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'CompositeElement', 'Table'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# extract unique types\n",
    "unique_types = {el.to_dict()['type'] for el in pdf_elements}\n",
    "unique_types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "aZXVtfYfKUJz"
   },
   "outputs": [],
   "source": [
    "# # display images from pdf\n",
    "# from IPython.display import Image, display\n",
    "# image_files = os.listdir('/content/figures')\n",
    "# image_files = [os.path.join('/content/figures', image_file) for image_file in image_files]\n",
    "\n",
    "# for image_file in image_files:\n",
    "#     display(Image(filename=image_file))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "ArQxTM8pHDWl"
   },
   "outputs": [],
   "source": [
    "# convert pdf_elements to langchain documents\n",
    "from langchain.schema import Document\n",
    "documents = [Document(page_content=el.text, metadata={\"source\": filename}) for el in pdf_elements]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "NLvE3-BXNMde"
   },
   "source": [
    "## **Vector Store**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "R-4GZaxzGx9E"
   },
   "outputs": [],
   "source": [
    "# create vectorstore\n",
    "from langchain.vectorstores import FAISS\n",
    "vectorstore = FAISS.from_documents(documents, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VRiczRDVNKzR"
   },
   "source": [
    "## **Retriever**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "YlliENVAGczz"
   },
   "outputs": [],
   "source": [
    "# create retriever\n",
    "retriever = vectorstore.as_retriever()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KbYdQT9xNKRT"
   },
   "source": [
    "## **RAG Chain**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "c6CIteJzNWCk"
   },
   "outputs": [],
   "source": [
    "# load llm\n",
    "from langchain_openai import ChatOpenAI\n",
    "llm = ChatOpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "id": "r7LT4JSCNYEM"
   },
   "outputs": [],
   "source": [
    "# create document chain\n",
    "from langchain.prompts import ChatPromptTemplate\n",
    "from langchain.schema.runnable import RunnablePassthrough\n",
    "from langchain.schema.output_parser import StrOutputParser\n",
    "\n",
    "template = \"\"\"\"\n",
    "You are a helpful assistant that answers questions based on the provided context, which can include text and tables.\n",
    "Use the provided context to answer the question.\n",
    "Question: {input}\n",
    "Context: {context}\n",
    "Answer:\n",
    "\"\"\"\n",
    "prompt = ChatPromptTemplate.from_template(template)\n",
    "\n",
    "# Setup RAG pipeline\n",
    "rag_chain = (\n",
    "    {\"context\": retriever,  \"input\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 105
    },
    "id": "1v4A3yzHNbJ-",
    "outputId": "0a960628-82a8-4159-867a-27b8fb1d9107"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'To compare all the Training Results on the MATH Test Set, we can look at the results from Table 6 in the provided context. The results are as follows:\\n\\n- deepseek-sft-abel:\\n   - SFT-phase1: 0.372\\n   - SFT-phase2-shortcutLearning: 0.386\\n   - SFT-phase2-journeyLearining: 0.470\\n   - DPO: 0.472\\n\\n- deepseek-sft-prm800k:\\n   - SFT-phase1: 0.290\\n   - SFT-phase2-shortcutLearning: 0.348\\n   - SFT-phase2-journeyLearining: 0.428\\n   - DPO: 0.440\\n\\nBased on these results, we can see that Journey Learning led to significant improvements compared to Shortcut Learning on both models, with gains of +8.4 and +8.0 on deepseek-sft-abel and deepseek-sft-prm800k, respectively. The DPO results were also provided for comparison.'"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# response\n",
    "response = rag_chain.invoke(\"Compare all the Training Results on MATH Test Set\")\n",
    "response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lq3KgKOKPi-J"
   },
   "source": [
    "## **Preparing Data for Evaluation**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "YlaS-vyfPx1G",
    "outputId": "e428c7fb-e611-4df5-f989-628d3ffadc06"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method `BaseRetriever.get_relevant_documents` was deprecated in langchain-core 0.1.46 and will be removed in 0.3.0. Use invoke instead.\n",
      "  warn_deprecated(\n"
     ]
    }
   ],
   "source": [
    "# create dataset\n",
    "question = [\"Compare all the Training Results on MATH Test Set\"]\n",
    "response = []\n",
    "contexts = []\n",
    "\n",
    "# Inference\n",
    "for query in question:\n",
    "  response.append(rag_chain.invoke(query))\n",
    "  contexts.append([docs.page_content for docs in retriever.get_relevant_documents(query)])\n",
    "\n",
    "# To dict\n",
    "data = {\n",
    "    \"query\": question,\n",
    "    \"response\": response,\n",
    "    \"context\": contexts,\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "id": "yJwNmnwwQFcZ"
   },
   "outputs": [],
   "source": [
    "# create dataset\n",
    "from datasets import Dataset\n",
    "dataset = Dataset.from_dict(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "id": "ht2xXRiTQHBu"
   },
   "outputs": [],
   "source": [
    "# create dataframe\n",
    "import pandas as pd\n",
    "df = pd.DataFrame(dataset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 89
    },
    "id": "nZDNxwXbQI8K",
    "outputId": "da52093d-2e02-40d0-8f68-842892259bc2"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "summary": "{\n  \"name\": \"df\",\n  \"rows\": 1,\n  \"fields\": [\n    {\n      \"column\": \"query\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 1,\n        \"samples\": [\n          \"Compare all the Training Results on MATH Test Set\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"response\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 1,\n        \"samples\": [\n          \"To compare all the Training Results on the MATH Test Set, we can look at the results provided in Table 6 from the context. The results for the different models on the MATH test set are as follows:\\n\\n- deepseek-sft-abel: SFT-phase1 = 0.372, SFT-phase2-shortcutLearning = 0.386, SFT-phase2-journeyLearining = 0.470, DPO = 0.472\\n- deepseek-sft-prm800k: SFT-phase1 = 0.290, SFT-phase2-shortcutLearning = 0.348, SFT-phase2-journeyLearining = 0.428, DPO = 0.440\\n\\nFrom these results, we can see that the Journey Learning method led to significant improvements compared to Shortcut Learning for both models on the MATH test set. The gains were +8.4 and +8.0 for the deepseek-sft-abel and deepseek-sft-prm800k models, respectively. The improvement from DPO was more modest in comparison.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"context\",\n      \"properties\": {\n        \"dtype\": \"object\",\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
       "type": "dataframe",
       "variable_name": "df"
      },
      "text/html": [
       "\n",
       "  <div id=\"df-bdb1e7f3-3804-4b7f-87f3-d92aee274663\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>query</th>\n",
       "      <th>response</th>\n",
       "      <th>context</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Compare all the Training Results on MATH Test Set</td>\n",
       "      <td>To compare all the Training Results on the MAT...</td>\n",
       "      <td>[The results of our experiments are shown in T...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "    <div class=\"colab-df-buttons\">\n",
       "\n",
       "  <div class=\"colab-df-container\">\n",
       "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-bdb1e7f3-3804-4b7f-87f3-d92aee274663')\"\n",
       "            title=\"Convert this dataframe to an interactive table.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
       "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "\n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    .colab-df-buttons div {\n",
       "      margin-bottom: 4px;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "    <script>\n",
       "      const buttonEl =\n",
       "        document.querySelector('#df-bdb1e7f3-3804-4b7f-87f3-d92aee274663 button.colab-df-convert');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      async function convertToInteractive(key) {\n",
       "        const element = document.querySelector('#df-bdb1e7f3-3804-4b7f-87f3-d92aee274663');\n",
       "        const dataTable =\n",
       "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                    [key], {});\n",
       "        if (!dataTable) return;\n",
       "\n",
       "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "          + ' to learn more about interactive tables.';\n",
       "        element.innerHTML = '';\n",
       "        dataTable['output_type'] = 'display_data';\n",
       "        await google.colab.output.renderOutput(dataTable, element);\n",
       "        const docLink = document.createElement('div');\n",
       "        docLink.innerHTML = docLinkHtml;\n",
       "        element.appendChild(docLink);\n",
       "      }\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "\n",
       "  <div id=\"id_5489877d-7a23-40d4-8baa-bb5a9cb3fee6\">\n",
       "    <style>\n",
       "      .colab-df-generate {\n",
       "        background-color: #E8F0FE;\n",
       "        border: none;\n",
       "        border-radius: 50%;\n",
       "        cursor: pointer;\n",
       "        display: none;\n",
       "        fill: #1967D2;\n",
       "        height: 32px;\n",
       "        padding: 0 0 0 0;\n",
       "        width: 32px;\n",
       "      }\n",
       "\n",
       "      .colab-df-generate:hover {\n",
       "        background-color: #E2EBFA;\n",
       "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "        fill: #174EA6;\n",
       "      }\n",
       "\n",
       "      [theme=dark] .colab-df-generate {\n",
       "        background-color: #3B4455;\n",
       "        fill: #D2E3FC;\n",
       "      }\n",
       "\n",
       "      [theme=dark] .colab-df-generate:hover {\n",
       "        background-color: #434B5C;\n",
       "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "        fill: #FFFFFF;\n",
       "      }\n",
       "    </style>\n",
       "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df')\"\n",
       "            title=\"Generate code using this dataframe.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "       width=\"24px\">\n",
       "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "    <script>\n",
       "      (() => {\n",
       "      const buttonEl =\n",
       "        document.querySelector('#id_5489877d-7a23-40d4-8baa-bb5a9cb3fee6 button.colab-df-generate');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      buttonEl.onclick = () => {\n",
       "        google.colab.notebook.generateWithVariable('df');\n",
       "      }\n",
       "      })();\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "    </div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "                                               query  \\\n",
       "0  Compare all the Training Results on MATH Test Set   \n",
       "\n",
       "                                            response  \\\n",
       "0  To compare all the Training Results on the MAT...   \n",
       "\n",
       "                                             context  \n",
       "0  [The results of our experiments are shown in T...  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "id": "GnKSiZ1yQVQu"
   },
   "outputs": [],
   "source": [
    "# Convert to dictionary\n",
    "df_dict = df.to_dict(orient='records')\n",
    "\n",
    "# Convert context to list\n",
    "for record in df_dict:\n",
    "    if not isinstance(record.get('context'), list):\n",
    "        if record.get('context') is None:\n",
    "            record['context'] = []\n",
    "        else:\n",
    "            record['context'] = [record['context']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Gt_JpGdnQa5v"
   },
   "source": [
    "## **Evaluation in Athina AI**\n",
    "\n",
    "We will use **Does Response Answer Query** eval here. It Checks if the response answer the user's query. To learn more about this. Please refer to our [documentation](https://docs.athina.ai/api-reference/evals/preset-evals/overview) for further details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "i_JqLJ6eQbaZ",
    "outputId": "1e4ccb88-1a88-48a7-b809-64d9832fd15c"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py:547: UserWarning: <built-in function any> is not a Python type (it may be an instance of an object), Pydantic will allow any object with no validation since we cannot even enforce that the input is an instance of the given type. To get rid of this error wrap the type with `pydantic.SkipValidation`.\n",
      "  warn(\n"
     ]
    }
   ],
   "source": [
    "# set api keys for Athina evals\n",
    "from athina.keys import AthinaApiKey, OpenAiApiKey\n",
    "OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))\n",
    "AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "id": "8vd7LKlWQfUg"
   },
   "outputs": [],
   "source": [
    "# load dataset\n",
    "from athina.loaders import Loader\n",
    "dataset = Loader().load_dict(df_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 445
    },
    "id": "lvTOKtOSQhKk",
    "outputId": "61f63043-a5b8-4e79-f97d-55fdea34e231"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You can view your dataset at: https://app.athina.ai/develop/e5dec38c-c58c-412d-b910-588d97ccd090\n"
     ]
    },
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "repr_error": "Out of range float values are not JSON compliant: nan",
       "type": "dataframe"
      },
      "text/html": [
       "\n",
       "  <div id=\"df-aaf37b24-7383-49ac-bdb0-dc829b6af3a2\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>query</th>\n",
       "      <th>context</th>\n",
       "      <th>response</th>\n",
       "      <th>expected_response</th>\n",
       "      <th>display_name</th>\n",
       "      <th>failed</th>\n",
       "      <th>grade_reason</th>\n",
       "      <th>runtime</th>\n",
       "      <th>model</th>\n",
       "      <th>passed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Compare all the Training Results on MATH Test Set</td>\n",
       "      <td>[The results of our experiments are shown in Table 6. All results are tested on the MATH test set, using a re-divided subset from PRM800K, which includes 500 examples. The results show that Journey Learning led to significant improvements compared to Shortcut Learning, with gains of +8.4 and +8.0 on the deepseek-sft-abel and deepseek-sft-prm800k models, respectively, demonstrating the effectiveness of our proposed Journey Learning method. However, the improvement from DPO was more modest, an...</td>\n",
       "      <td>To compare all the Training Results on the MATH Test Set, we can look at the results provided in Table 6 from the context. The results for the different models on the MATH test set are as follows:\\n\\n- deepseek-sft-abel: SFT-phase1 = 0.372, SFT-phase2-shortcutLearning = 0.386, SFT-phase2-journeyLearining = 0.470, DPO = 0.472\\n- deepseek-sft-prm800k: SFT-phase1 = 0.290, SFT-phase2-shortcutLearning = 0.348, SFT-phase2-journeyLearining = 0.428, DPO = 0.440\\n\\nFrom these results, we can see that...</td>\n",
       "      <td>None</td>\n",
       "      <td>Does Response Answer Query</td>\n",
       "      <td>False</td>\n",
       "      <td>The response provides a detailed comparison of the training results on the MATH test set for two models, deepseek-sft-abel and deepseek-sft-prm800k. It includes specific performance metrics for different phases and methods, such as SFT-phase1, SFT-phase2-shortcutLearning, SFT-phase2-journeyLearning, and DPO. Additionally, it highlights the improvements observed with the Journey Learning method compared to Shortcut Learning, which directly addresses the user's query about comparing training r...</td>\n",
       "      <td>1910</td>\n",
       "      <td>gpt-4o</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "    <div class=\"colab-df-buttons\">\n",
       "\n",
       "  <div class=\"colab-df-container\">\n",
       "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-aaf37b24-7383-49ac-bdb0-dc829b6af3a2')\"\n",
       "            title=\"Convert this dataframe to an interactive table.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
       "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "\n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    .colab-df-buttons div {\n",
       "      margin-bottom: 4px;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "    <script>\n",
       "      const buttonEl =\n",
       "        document.querySelector('#df-aaf37b24-7383-49ac-bdb0-dc829b6af3a2 button.colab-df-convert');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      async function convertToInteractive(key) {\n",
       "        const element = document.querySelector('#df-aaf37b24-7383-49ac-bdb0-dc829b6af3a2');\n",
       "        const dataTable =\n",
       "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                    [key], {});\n",
       "        if (!dataTable) return;\n",
       "\n",
       "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "          + ' to learn more about interactive tables.';\n",
       "        element.innerHTML = '';\n",
       "        dataTable['output_type'] = 'display_data';\n",
       "        await google.colab.output.renderOutput(dataTable, element);\n",
       "        const docLink = document.createElement('div');\n",
       "        docLink.innerHTML = docLinkHtml;\n",
       "        element.appendChild(docLink);\n",
       "      }\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "\n",
       "    </div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "                                               query  \\\n",
       "0  Compare all the Training Results on MATH Test Set   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               context  \\\n",
       "0  [The results of our experiments are shown in Table 6. All results are tested on the MATH test set, using a re-divided subset from PRM800K, which includes 500 examples. The results show that Journey Learning led to significant improvements compared to Shortcut Learning, with gains of +8.4 and +8.0 on the deepseek-sft-abel and deepseek-sft-prm800k models, respectively, demonstrating the effectiveness of our proposed Journey Learning method. However, the improvement from DPO was more modest, an...   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              response  \\\n",
       "0  To compare all the Training Results on the MATH Test Set, we can look at the results provided in Table 6 from the context. The results for the different models on the MATH test set are as follows:\\n\\n- deepseek-sft-abel: SFT-phase1 = 0.372, SFT-phase2-shortcutLearning = 0.386, SFT-phase2-journeyLearining = 0.470, DPO = 0.472\\n- deepseek-sft-prm800k: SFT-phase1 = 0.290, SFT-phase2-shortcutLearning = 0.348, SFT-phase2-journeyLearining = 0.428, DPO = 0.440\\n\\nFrom these results, we can see that...   \n",
       "\n",
       "  expected_response                display_name  failed  \\\n",
       "0              None  Does Response Answer Query   False   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          grade_reason  \\\n",
       "0  The response provides a detailed comparison of the training results on the MATH test set for two models, deepseek-sft-abel and deepseek-sft-prm800k. It includes specific performance metrics for different phases and methods, such as SFT-phase1, SFT-phase2-shortcutLearning, SFT-phase2-journeyLearning, and DPO. Additionally, it highlights the improvements observed with the Journey Learning method compared to Shortcut Learning, which directly addresses the user's query about comparing training r...   \n",
       "\n",
       "   runtime   model  passed  \n",
       "0     1910  gpt-4o     1.0  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# evaluate\n",
    "from athina.evals import DoesResponseAnswerQuery\n",
    "DoesResponseAnswerQuery(model=\"gpt-4o\").run_batch(data=dataset).to_df()"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
